Abstract

In this paper we establish a uniform bound for the distribution of
a sum $S_n = X_1 + \cdots + X_n$ of independent non-homogeneous Bernoulli
random variables with $P(X_i = 1) = p_i$. Specifically, we prove that
$\sigma_n P(S_n = i) \le M$, where $\sigma_n$ denotes the standard deviation of $S_n$ and
the constant $M \approx 0.4688$ is the maximum of $u \mapsto \sqrt{2u}\, e^{-2u} \sum_{k=0}^{\infty} (u^k/k!)^2$.

1 Introduction 

The main goal of this paper is to establish the following: 

Theorem 1 Let $S_n = X_1 + \cdots + X_n$ be a sum of independent non-homogeneous
Bernoulli random variables with $P(X_i = 1) = p_i$, and let $\sigma_n = \sqrt{\sum_{i=1}^n p_i(1-p_i)}$
denote its standard deviation. Then, for all $i \in \mathbb{Z}$ we have

$$\sigma_n P(S_n = i) \le M \qquad (1)$$

with $M = \max_{u \ge 0} \sqrt{2u}\, e^{-2u} \sum_{k=0}^{\infty} (u^k/k!)^2$. This bound is sharp and the constant
is approximately $M \approx 0.46882235549939533$.

The novelty lies in the uniform character of the bound (1) and the fact
that the constant $M$ is the best possible. This result complements the vast
literature on bounds for large deviations probabilities, such as the Markov,
Chebyshev, Hoeffding, and Chernoff bounds, which focus on finding sharp
estimates for $P(|S - E(S)| \ge t)$ when $S$ is a sum of $n$ i.i.d. random variables
and $t > 0$. Such large deviations bounds (see, e.g., [2], [10], [12], [13]) are
very useful in probability and statistics, providing formulas of the type

$$P(|S - E(S)| \ge t) \le f(n t^2)$$

with $f(0) \ge 1$ and $\lim_{x \to \infty} f(x) = 0$, usually with an exponential decay. In
our particular context, this implies that $P(S_n = i)$ tends to 0 whenever $i$ stays
away from the mean $E(S_n)$, a stronger conclusion which does not follow from
(1). Thus, the main addition of Theorem 1 is that it can deal with all values
of $i$, including those which are close to the mean.

This uniform bound has already proved useful for addressing two very 
different and unrelated questions: (a) to study the rate of convergence of 
Mann's iterates for non-expansive linear operators (see [15]); and (b) to give
an approximation guarantee for an algorithm in combinatorial optimization 
(see [6]). We hope that the bound may be useful in other settings as well. 

In the rest of the paper we present the proof of Theorem 1. In §2 this 
proof is split into a series of basic steps, each one using only elementary tools 
that fit together in a surprisingly sharp way to yield the announced result. 
In the short final section §3 we discuss a simple extension of the main result
to the case of sums and differences of Bernoullis, as well as limits of such
variables, which include the difference of two Poisson distributions.

2 Proof of Theorem 1

We must show that for each $p = (p_1, \ldots, p_n) \in [0,1]^n$ we have $R_i^n(p) \le M$,
where $R_i^n(p) = \sigma_n P_i^n$ with $\sigma_n = \sqrt{\sum_{i=1}^n p_i(1-p_i)}$ and

$$P_i^n = P(S_n = i) = \sum_{\substack{A \subseteq \{1,\ldots,n\} \\ |A| = i}} \ \prod_{j \in A} p_j \prod_{j \notin A} (1 - p_j).$$

Usually $P_i^n$ is defined only for $i = 0, \ldots, n$. However, $P_i^n = P(S_n = i) = 0$ is
also meaningful outside this range and allows us to write the recursive formula

$$P_i^n = p_n P_{i-1}^{n-1} + (1 - p_n) P_i^{n-1}.$$
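
The recursion above translates directly into a small dynamic program. The following Python sketch is an illustration only and not part of the original argument: the names `pmf_bernoulli_sum`, `scaled_peak` and `M_CONST` are hypothetical, and `M_CONST` is simply the numerical value of $M$ quoted in Theorem 1.

```python
import math

M_CONST = 0.46882235549939533  # value of M quoted in Theorem 1 (assumed, not recomputed here)


def pmf_bernoulli_sum(p):
    """Return [P(S_n = i) for i = 0..n] using P_i^n = p_n P_{i-1}^{n-1} + (1 - p_n) P_i^{n-1}."""
    pmf = [1.0]  # n = 0: the empty sum equals 0 with probability 1
    for pn in p:
        new = [0.0] * (len(pmf) + 1)
        for i, prob in enumerate(pmf):
            new[i] += (1.0 - pn) * prob  # X_n = 0 keeps the sum at i
            new[i + 1] += pn * prob      # X_n = 1 moves the sum to i + 1
        pmf = new
    return pmf


def scaled_peak(p):
    """Return max_i sigma_n * P(S_n = i), the quantity that Theorem 1 bounds by M."""
    sigma = math.sqrt(sum(q * (1.0 - q) for q in p))
    return sigma * max(pmf_bernoulli_sum(p))
```

For instance, `scaled_peak([0.5, 0.5])` evaluates $\sqrt{1/2} \cdot 1/2 \approx 0.3536$, which is below `M_CONST`.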

2.1 Reduction to a sum of 2 Binomial distributions

The expressions $P_i^n$, $\sigma_n$ and $R_i^n$ are continuous symmetric functions of $p$. We
claim that $R_i^n(p)$ is maximal when the $p_j$'s take at most two values in $(0,1)$.
In a different context, this result was established in [15, Vaisman] but it has
not been published elsewhere.

Proposition 2 For all $p \in [0,1]^n$ we have

$$R_i^n(p) \le \sup_{q \in [0,1]^n} \left\{ R_i^n(q) : |\{q_j : 0 < q_j < 1\}| \le 2 \right\}. \qquad (2)$$

Proof. Let $Q$ be the set of vectors $p \in [0,1]^n$ attaining the maximum of $R_i^n(p)$.
Since $Q$ is compact we may find $q \in Q$ minimal in the lexicographic order.
We claim that $|\{q_j : 0 < q_j < 1\}| \le 2$, from which the result follows.

Assume by contradiction that $q$ has 3 different entries $0 < q_r < q_s < q_t < 1$.

Denoting $q_0 = (q_j)_{j \ne r,s,t}$ and $\bar q_j = 1 - q_j$ we have

$$\begin{aligned} P_i^n(q) ={}& q_r q_s q_t\, P_{i-3}^{n-3}(q_0) + [q_r q_s \bar q_t + q_r \bar q_s q_t + \bar q_r q_s q_t]\, P_{i-2}^{n-3}(q_0) \\ &+ [q_r \bar q_s \bar q_t + \bar q_r q_s \bar q_t + \bar q_r \bar q_s q_t]\, P_{i-1}^{n-3}(q_0) + \bar q_r \bar q_s \bar q_t\, P_i^{n-3}(q_0) \end{aligned}$$

which may be rewritten as $P_i^n(q) = F(q_r, q_s, q_t)$ where

$$F(x,y,z) = Axyz + B(xy + xz + yz) + C(x + y + z) + D$$

for appropriate constants $A, B, C, D$. Similarly $\sigma_n(q) = \sqrt{V(q_r, q_s, q_t)}$ with

$$V(x,y,z) = x(1-x) + y(1-y) + z(1-z) + \sigma_{n-3}(q_0)^2.$$

Since the maximum of $R_i^n(\cdot) = \sigma_n(\cdot) P_i^n(\cdot)$ is attained at $q$, it follows that
$\sqrt{V(\cdot)}\,F(\cdot)$ is maximal at $(q_r, q_s, q_t) \in \mathrm{int}([0,1]^3)$, and therefore its gradient
vanishes at this point, namely

$$\sqrt{V(q_r,q_s,q_t)}\ \nabla F(q_r,q_s,q_t) + \frac{F(q_r,q_s,q_t)}{2\sqrt{V(q_r,q_s,q_t)}}\ \nabla V(q_r,q_s,q_t) = 0,$$

so that setting $\lambda = -\dfrac{F(q_r,q_s,q_t)}{2V(q_r,q_s,q_t)}$ this gives explicitly

$$\begin{aligned} A q_s q_t + B(q_s + q_t) + C &= \lambda(1 - 2q_r) \\ A q_r q_t + B(q_r + q_t) + C &= \lambda(1 - 2q_s) \\ A q_r q_s + B(q_r + q_s) + C &= \lambda(1 - 2q_t). \end{aligned}$$

Subtracting the first two equations and simplifying by $(q_s - q_r) \ne 0$ we get
$A q_t + B = 2\lambda$. Similarly, the second and third equations combined yield
$A q_r + B = 2\lambda$, so that $A q_t = A q_r$ and since $q_r \ne q_t$ we conclude $A = 0$.

Now, $A = 0$ implies that the function $F(\cdot)$ depends only on the values of
$x + y + z$ and $xy + xz + yz$, while the same holds for $V(\cdot)$ since

$$x(1-x) + y(1-y) + z(1-z) = (x+y+z) - (x+y+z)^2 + 2(xy+xz+yz).$$

Thus, $\sqrt{V(\cdot)}\,F(\cdot)$ is constant over the set defined by $x + y + z = q_r + q_s + q_t$
and $xy + xz + yz = q_r q_s + q_r q_t + q_s q_t$, and therefore any such vector $(x, y, z, q_0)$
maximizes $R_i^n(\cdot)$. Since $q \in Q$ is lexicographically minimal, it follows that
$q_r \le x$ and therefore the triple $(q_r, q_s, q_t)$ also solves

$$\begin{array}{ll} \min & x \\ \text{s.t.} & (x,y,z) \in [0,1]^3 \\ & x + y + z = q_r + q_s + q_t \\ & xy + xz + yz = q_r q_s + q_r q_t + q_s q_t. \end{array}$$

Since $q_r, q_s, q_t$ are different, the gradients of the two equality constraints at
this optimal point are linearly independent, while the inequality constraints
are non-binding. Hence the Mangasarian-Fromovitz constraint qualification
holds and we may find Lagrange multipliers $\alpha$ and $\beta$ to write down the
following necessary optimality conditions

$$\begin{aligned} 1 &= \alpha + \beta(q_s + q_t) \\ 0 &= \alpha + \beta(q_r + q_t) \\ 0 &= \alpha + \beta(q_r + q_s). \end{aligned}$$

Since $q_s \ne q_t$ the two last equations imply $\alpha = \beta = 0$, which is incompatible
with the first equation. This contradiction shows that the constant $A$ cannot
be 0, and therefore the assumption $0 < q_r < q_s < q_t < 1$ was absurd. □

Removing all $q_j \in \{0, 1\}$, which correspond to deterministic variables $X_j$,
Proposition 2 shows that it suffices to prove (1) for $p_j$'s taking at most two
values, that is to say, for $S_n = U + V$ with $U \sim B(a, \lambda)$ and $V \sim B(b, \mu)$
independent Binomials. More specifically, denoting $B_k^n(p) = \binom{n}{k} p^k (1-p)^{n-k}$
the Binomial probabilities and defining the constant

$$\bar M = \sup_{\substack{a, b, i \in \mathbb{N} \\ \lambda, \mu \in [0,1]}} \sqrt{a\lambda(1-\lambda) + b\mu(1-\mu)}\ \sum_{k=0}^{i} B_k^a(\lambda)\, B_{i-k}^b(\mu) \qquad (3)$$

we have the following sharp estimate

Corollary 3 For all $n \in \mathbb{N}$, $p \in [0,1]^n$ and $i \in \mathbb{Z}$ we have $R_i^n(p) \le \bar M$.

2.2 An upper bound for the constant $\bar M$

Before computing the optimal constant $\bar M$, we establish a simpler upper
bound. As a consequence of this analysis it turns out that considering two
Binomial distributions in the supremum (3) is essential.

Lemma 4 $\bar M \le \dfrac{1}{\sqrt{e}}$.

Proof. Let $a, b, i \in \mathbb{N}$ and $\lambda, \mu \in [0,1]$, and assume without loss of generality
that $b\mu(1-\mu) \le a\lambda(1-\lambda)$. Since the probabilities $B_{i-k}^b(\mu)$ are non-negative
and add up to at most 1, the sum in (3) may be bounded from above by the
maximal $B_k^a(\lambda)$, so that $\bar M \le 1/\sqrt{e}$ will follow if we prove the inequality

$$\sqrt{2a\lambda(1-\lambda)}\ B_k^a(\lambda) \le \frac{1}{\sqrt{e}}.$$

For $a$ and $k$ given, the maximum in $\lambda \in [0,1]$ is attained at $\lambda = \frac{k + 1/2}{a + 1}$, so that
replacing this value all we must show is that

$$\binom{a}{k} \sqrt{2a}\ \frac{(k + \frac12)^{k + \frac12}\,(a - k + \frac12)^{a - k + \frac12}}{(a+1)^{a+1}} \le \frac{1}{\sqrt{e}}.$$

Let $C_k^a$ denote this expression. We claim that its maximum over $k$ is attained
at the extreme values $k = 0$ and $k = a$. Indeed, the quotient between $C_{k+1}^a$
and $C_k^a$ (for $0 \le k \le a - 1$) may be expressed as

$$\frac{C_{k+1}^a}{C_k^a} = \frac{H(a-k)}{H(k+1)}$$

where $H(x) = x\,(x - \frac12)^{x - \frac12} / (x + \frac12)^{x + \frac12}$. A simple calculus exercise shows
that $H(\cdot)$ is decreasing so that $H(a-k) \ge H(k+1)$ iff $a - k \le k + 1$. This
implies that $C_k^a$ decreases for $k < \frac{a-1}{2}$ and increases afterwards, so that its
maximum is attained at $k = 0$ or $k = a$. The conclusion follows since

$$C_0^a = C_a^a = \frac{\sqrt{a}\,(a + \frac12)^{a + \frac12}}{(a+1)^{a+1}} = \sqrt{\frac{a}{a + \frac12}}\ \left(1 - \frac{1}{2(a+1)}\right)^{a+1} \le \exp\left(-\tfrac12\right) = \frac{1}{\sqrt{e}}.$$

Remark. The proof above shows that for a single binomial distribution we
have $\sqrt{a\lambda(1-\lambda)}\ B_k^a(\lambda) \le \frac{1}{\sqrt{2e}} \approx 0.4289$. Since we will prove that $\bar M \approx 0.4688$,
it follows that allowing $\lambda \ne \mu$ in (3) is essential.
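
The single-binomial estimate in this remark can be checked numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and it evaluates the scaled probability only at the maximizing value $\lambda = (k + \frac12)/(a+1)$ identified in the proof of Lemma 4.

```python
import math


def binom_pmf(a, k, lam):
    """Binomial probability B_k^a(lam) = C(a, k) lam^k (1 - lam)^(a - k)."""
    return math.comb(a, k) * lam**k * (1.0 - lam)**(a - k)


def single_binomial_peak(a, k):
    """sqrt(a lam (1 - lam)) * B_k^a(lam) at the maximizing lam = (k + 1/2) / (a + 1)."""
    lam = (k + 0.5) / (a + 1.0)
    return math.sqrt(a * lam * (1.0 - lam)) * binom_pmf(a, k, lam)


def worst_case(a_max):
    """Largest scaled probability over 1 <= a <= a_max and 0 <= k <= a."""
    return max(single_binomial_peak(a, k)
               for a in range(1, a_max + 1) for k in range(a + 1))
```

Comparing `worst_case(a_max)` with `1 / math.sqrt(2 * math.e)` for growing `a_max` shows the values approaching the bound from below, with the largest value for each `a` found at `k = 0` (equivalently `k = a`).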

2.3 Exact value of the constant $\bar M$

Let $\psi_n(p) = \sum_{k=0}^{n} B_k^n(p)^2$. Considering the expression in the supremum (3)
and using the Cauchy-Schwarz inequality we get

$$\bar M \le \sup_{\substack{a, b \in \mathbb{N} \\ \lambda, \mu \in [0,1]}} \sqrt{a\lambda(1-\lambda) + b\mu(1-\mu)}\ \sqrt{\psi_a(\lambda)\,\psi_b(\mu)}.$$

We may restate this supremum using the change of variables $x = a\lambda(1-\lambda)$
and $y = b\mu(1-\mu)$, so that

$$\lambda = \tfrac12\left(1 \pm \sqrt{1 - 4x/a}\right), \qquad \mu = \tfrac12\left(1 \pm \sqrt{1 - 4y/b}\right).$$

Then, defining $\varphi_n(u) = \psi_n(p_n(u))$ with¹ $p_n(u) = \tfrac12\left(1 - \sqrt{1 - 4u/n}\right)$, we may
rewrite the previous bound as

$$\bar M \le \sup_{\substack{a, b \in \mathbb{N} \\ 0 \le x \le a/4 \\ 0 \le y \le b/4}} \sqrt{x + y}\ \sqrt{\varphi_a(x)\,\varphi_b(y)}. \qquad (4)$$

The key property for our subsequent analysis is the following.

Proposition 5 $\varphi_n(u)$ increases with $n$ for $n \ge 4u$.

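Proof. We begin by observing that $\psi_n(p) = P(U = V)$ where $U$ and $V$ are
independent Binomials $B(n,p)$. Indeed, conditioning on $V$ we get

$$P(U = V) = \sum_{k=0}^{n} P(U = V \mid V = k)\, P(V = k) = \sum_{k=0}^{n} P(U = k)\, P(V = k) = \sum_{k=0}^{n} B_k^n(p)^2.$$

Let $\pi_k = P(U - V = k)$ and consider the $z$-transform of $U - V$, that is

$$\xi(z) = E\left[z^{U - V}\right] = \sum_{k=-n}^{n} \pi_k z^k.$$

This map is holomorphic on $\mathbb{C} \setminus \{0\}$ with a pole of degree $n$ at the origin and
constant coefficient $\pi_0 = \psi_n(p)$. Thus, integrating the function $\xi(z)/z$ along
the unit circle $C$ we get

$$\psi_n(p) = \pi_0 = \frac{1}{2\pi i} \oint_C \frac{\xi(z)}{z}\, dz = \frac{1}{2\pi} \int_0^{2\pi} \xi(e^{i\theta})\, d\theta. \qquad (5)$$

¹ The same value $\varphi_n(u)$ is obtained if we take the other root $\tfrac12\left(1 + \sqrt{1 - 4u/n}\right)$.

The function $\xi(z)$ may be explicitly computed by expressing $U = \sum_{i=1}^{n} X_i$
and $V = \sum_{i=1}^{n} Y_i$ as sums of independent Bernoullis of parameter $p$, namely

$$\begin{aligned} \xi(z) &= E\left[\prod_{i=1}^{n} z^{X_i} \cdot \prod_{i=1}^{n} z^{-Y_i}\right] \\ &= [(1-p) + p z]^n \cdot [(1-p) + p z^{-1}]^n \\ &= [p^2 + (1-p)^2 + p(1-p)(z + z^{-1})]^n \end{aligned}$$

from which it follows

$$\xi(e^{i\theta}) = [p^2 + (1-p)^2 + 2p(1-p)\cos\theta]^n.$$

Now, evaluating at $p = p_n(u)$, for which $p(1-p) = u/n$, a simple calculation yields

$$\xi(e^{i\theta}) = \left[1 - \frac{2u(1 - \cos\theta)}{n}\right]^n$$

which plugged into (5) gives the formula

$$\varphi_n(u) = \psi_n(p_n(u)) = \frac{1}{2\pi} \int_0^{2\pi} \left[1 - \frac{2u(1 - \cos\theta)}{n}\right]^n d\theta. \qquad (6)$$

Since $n \ge 4u \ge 2u(1 - \cos\theta)$, the expression under the integral sign increases
with respect to $n$, from which the conclusion follows. □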
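
Both the integral formula (6) and the monotonicity in $n$ are easy to check numerically. The following Python sketch is only an illustration: the function names are hypothetical, the midpoint rule is an assumed choice of quadrature, and `n` should stay moderate (a few hundred at most) because the binomial coefficients are converted to floating point.

```python
import math


def psi(n, p):
    """psi_n(p) = sum_k B_k^n(p)^2 = P(U = V) for independent U, V ~ B(n, p)."""
    return sum((math.comb(n, k) * p**k * (1.0 - p)**(n - k))**2
               for k in range(n + 1))


def phi(n, u):
    """phi_n(u) = psi_n(p_n(u)) with p_n(u) = (1 - sqrt(1 - 4u/n)) / 2; needs n >= 4u."""
    p = 0.5 * (1.0 - math.sqrt(1.0 - 4.0 * u / n))
    return psi(n, p)


def phi_integral(n, u, steps=2000):
    """Midpoint-rule value of (1/2pi) * integral over [0, 2pi] of [1 - 2u(1 - cos t)/n]^n."""
    h = 2.0 * math.pi / steps
    total = sum((1.0 - 2.0 * u * (1.0 - math.cos((j + 0.5) * h)) / n)**n
                for j in range(steps))
    return total / steps  # equals total * h / (2 * pi)
```

For example, `phi(2, 0.4)` and `phi_integral(2, 0.4)` both evaluate to $0.44$ up to rounding, and the list `[phi(n, 0.4) for n in range(2, 60)]` is increasing.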

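This result implies that the supremum in (4) is attained for $a, b \to \infty$,
that is to say

$$\bar M \le M = \sup_{x, y \ge 0} \sqrt{x + y}\ \sqrt{\varphi^\infty(x)\,\varphi^\infty(y)} \qquad (7)$$

where $\varphi^\infty(u) = \lim_{n \to \infty} \varphi_n(u)$. Letting $n \to \infty$ in (6) we get an explicit
expression for this limit function

$$\varphi^\infty(u) = \frac{1}{2\pi} \int_0^{2\pi} e^{-2u(1 - \cos\theta)}\, d\theta. \qquad (8)$$

Alternatively, we may pass to the limit directly in the original expression

$$\varphi_n(u) = \sum_{k=0}^{n} \left(B_k^n(p_n(u))\right)^2$$

and since it is easily seen that $n\,p_n(u) \to u$, the Binomial distribution
converges to a Poisson and we get the series representation

$$\varphi^\infty(u) = \sum_{k=0}^{\infty} \left(e^{-u}\, \frac{u^k}{k!}\right)^2 = e^{-2u} \sum_{k=0}^{\infty} \left(\frac{u^k}{k!}\right)^2. \qquad (9)$$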

Clearly, the latter could have also been obtained by a series expansion of (8).
We proceed next to show that the supremum in (7) is attained.
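
The agreement between the integral form (8) and the series form (9) can be confirmed numerically. The sketch below is illustrative: the function names, the truncation at 60 terms and the midpoint quadrature are assumptions made for this example.

```python
import math


def phi_inf_series(u, terms=60):
    """Series (9): exp(-2u) * sum_k (u^k / k!)^2, truncated after `terms` terms."""
    total, term = 0.0, 1.0  # term holds u^k / k!
    for k in range(terms):
        total += term * term
        term *= u / (k + 1)
    return math.exp(-2.0 * u) * total


def phi_inf_integral(u, steps=2000):
    """Integral (8), (1/2pi) * integral of exp(-2u(1 - cos t)) over [0, 2pi], by the midpoint rule."""
    h = 2.0 * math.pi / steps
    total = sum(math.exp(-2.0 * u * (1.0 - math.cos((j + 0.5) * h)))
                for j in range(steps))
    return total / steps  # equals total * h / (2 * pi)
```

For moderate values of `u` (say up to 3) the two functions agree to within rounding error, since the midpoint rule converges very quickly for smooth periodic integrands.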

Proposition 6 The supremum $M$ in (7) is attained at a unique point $(x^*, y^*)$ with
$x^* = y^* = \bar u > 0$, where $\bar u$ is the optimal solution of $M = \sup_{u \ge 0} M(u)$ with

$$M(u) = \sqrt{2u}\ \varphi^\infty(u) = \sqrt{2u}\ e^{-2u} \sum_{k=0}^{\infty} \left(\frac{u^k}{k!}\right)^2.$$

Proof. Denoting $h(u) = \ln \varphi^\infty(u)$, it is clear that solving (7) is equivalent to

$$\sup_{x, y \ge 0}\ \ln(x + y) + h(x) + h(y). \qquad (10)$$

We claim that $h(\cdot)$ is strictly concave, so that this problem has at most one
optimal solution. We will then prove that there is a point $\bar u \in (0, 1)$ that
maximizes $M(u)$ and therefore it satisfies $\frac{1}{2\bar u} + h'(\bar u) = 0$, from which it follows
at once that the point $(x^*, y^*)$ with $x^* = y^* = \bar u$ is a stationary point for (10)
and is therefore the unique optimal solution.

$h(\cdot)$ is strictly concave. For each $u \ge 0$ the function

$$\rho_u(\theta) = \frac{e^{-2u(1 - \cos\theta)}}{2\pi\, \varphi^\infty(u)}$$

defines a probability density on the interval $[0, 2\pi]$. Moreover, using (8), a
direct computation allows us to express the second derivative $h''(u)$ as

$$h''(u) = -4\left[\int_0^{2\pi} (1 - \cos\theta)^2 \rho_u(\theta)\, d\theta - \left(\int_0^{2\pi} (1 - \cos\theta)\, \rho_u(\theta)\, d\theta\right)^2\right] < 0,$$

where the last inequality follows from Jensen's inequality applied to the
strictly convex function $z \mapsto z^2$.

Existence of a unique maximizer $\bar u$ for $M(u)$. The strict concavity of $h(\cdot)$
implies that $M(\cdot)$ is strictly log-concave. Since $M(0) = 0$ and $M(u) > 0$ for
$u > 0$, the existence of a unique maximizer $\bar u \in (0, 1)$ will follow if we prove
that $M'(1) < 0$. Now, using the expression (9) we may readily compute
$M'(u)$ which evaluated at $u = 1$ gives

$$M'(1) = \frac{1}{\sqrt{2}\, e^2} \left[4 \sum_{k=0}^{\infty} \frac{1}{k!\,(k+1)!} - 3 \sum_{k=0}^{\infty} \frac{1}{(k!)^2}\right].$$

The first two terms of both sums cancel out, and then we conclude

$$M'(1) = \frac{1}{\sqrt{2}\, e^2} \sum_{k=2}^{\infty} \frac{1}{(k!)^2} \left[\frac{4}{k+1} - 3\right] < 0.$$

□

Remark. Numerically we find that $\bar u \approx 0.39498892975658451$, which gives
the approximate value $M \approx 0.46882235549939533$.
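
These numerical values can be reproduced with a few lines of code. The sketch below is one possible approach, not necessarily the computation used for the values above: it applies bisection on $(0, 1)$ to the stationarity condition $\frac{1}{2u} + h'(u) = 0$, with $h'(u)$ obtained by differentiating the series (9) term by term. The function names, the truncation at 60 terms and the bracketing interval are assumptions made for this example.

```python
import math


def series_sums(u, terms=60):
    """Return (S, T) with S = sum_k t_k^2 and T = sum_k k * t_k^2, where t_k = u^k / k!."""
    s = t_sum = 0.0
    t = 1.0
    for k in range(terms):
        s += t * t
        t_sum += k * t * t
        t *= u / (k + 1)
    return s, t_sum


def M_of(u):
    """M(u) = sqrt(2u) * exp(-2u) * sum_k (u^k / k!)^2."""
    s, _ = series_sums(u)
    return math.sqrt(2.0 * u) * math.exp(-2.0 * u) * s


def log_derivative(u):
    """d/du ln M(u) = 1/(2u) - 2 + (2/u) * T / S, from term-by-term differentiation."""
    s, t_sum = series_sums(u)
    return 0.5 / u - 2.0 + (2.0 / u) * t_sum / s


def maximizer(lo=1e-6, hi=1.0, iters=200):
    """Bisection for the zero of log_derivative on (lo, hi), where it changes sign."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_derivative(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `u = maximizer()` followed by `M_of(u)` returns values that can be compared directly with the two constants quoted in the remark.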

Remark. We observe that $\sum_{k=0}^{\infty} (u^k/k!)^2 = B(0, 2u)$ where $B(0, \cdot)$ is a modified
Bessel function of the first kind (usually denoted $I_0$), namely the solution of $x^2 y'' + x y' - x^2 y = 0$.
Thus, it seems unlikely to find simpler formulas for $M$ or $\bar u$ in terms of known
constants and functions.

In order to complete the proof of Theorem 1, it remains to show that 

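Proposition 7 $\bar M = M$ and the bound (1) is sharp.

Proof. Both conclusions will follow simultaneously if we exhibit a family of
$S_n$'s and $i$'s for which $\sigma_n P(S_n = i)$ tends to $M$. To this end let us consider
for $S_n$ the particular case of a sum of $n = 2a$ Bernoullis, half of which have
$p_i = \lambda$ while for the other half we take $p_i = \mu = 1 - \lambda$, with $\lambda = \bar u / a$. Hence,
$S_n = U + V$ with $U \sim B(a, \lambda)$ and $V \sim B(a, 1 - \lambda)$ independent Binomials.
If we now take $i = a$, we get

$$\begin{aligned} \sigma_n P(S_n = a) &= \sqrt{2a\lambda(1-\lambda)}\ \sum_{k=0}^{a} B_k^a(\lambda)\, B_{a-k}^a(1 - \lambda) \\ &= \sqrt{2\bar u\left(1 - \tfrac{\bar u}{a}\right)}\ \sum_{k=0}^{a} \binom{a}{k}^2 \left(\tfrac{\bar u}{a}\right)^{2k} \left(1 - \tfrac{\bar u}{a}\right)^{2(a-k)} \end{aligned}$$

which is easily seen to converge towards $M(\bar u) = M$ when $a \to \infty$. □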
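
The convergence used in this proof can be observed numerically. The following sketch is illustrative only: the function name is hypothetical, and `U_STAR` and `M_CONST` are the numerical values of $\bar u$ and $M$ quoted earlier in the text, taken as given rather than recomputed.

```python
import math

U_STAR = 0.39498892975658451   # maximizer quoted in the Remark after Proposition 6
M_CONST = 0.46882235549939533  # constant M quoted in Theorem 1


def scaled_central_probability(a, u=U_STAR):
    """sigma_n * P(S_n = a) for S_n = U + V with U ~ B(a, u/a), V ~ B(a, 1 - u/a)."""
    lam = u / a
    # P(S_n = a) = sum_k B_k^a(lam) * B_{a-k}^a(1 - lam) = sum_k B_k^a(lam)^2
    total = sum((math.comb(a, k) * lam**k * (1.0 - lam)**(a - k))**2
                for k in range(a + 1))
    return math.sqrt(2.0 * a * lam * (1.0 - lam)) * total
```

Evaluating `scaled_central_probability(a)` for `a = 1, 5, 50, 500` gives values that stay below `M_CONST` and approach it as `a` grows.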



3 A simple extension and a final comment 

As a straightforward corollary of Theorem 1 we get that the bound (1) still
holds for any random variable $S_n = \sum_{i=1}^{n} \pm X_i$ that can be expressed as sums
and differences of non-homogeneous independent Bernoullis. Moreover, the
bound will remain true for limits of such variables, so that it holds in fact for
all combinations of sums and differences of Bernoulli, Binomial, and Poisson
variables. We record the purely Poissonian case in the next

Corollary 8 Let $S = X - Y$ with $X \sim \mathcal{P}(x)$ and $Y \sim \mathcal{P}(y)$ two independent
Poisson random variables. Then for all $i \in \mathbb{N}$ we have

$$\sigma_S P(S = i) \le M.$$

The bound $M$ is attained with equality when $x = y = \bar u$ and $i = 0$.

Proof. The upper bound follows from the previous analysis. For the last
claim we notice that when $x = y = u$ and $i = 0$ we have

$$\sigma_S P(S = 0) = \sqrt{2u}\ \sum_{k=0}^{\infty} \left(e^{-u}\, \frac{u^k}{k!}\right)^2 = M(u)$$

so that the bound $M$ is attained with equality if $u = \bar u$. □
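
The Poissonian case lends itself to a direct numerical check. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the series is truncated at 80 terms (adequate for small means), and `M_CONST` is the value of $M$ quoted in Theorem 1.

```python
import math

M_CONST = 0.46882235549939533  # constant M quoted in Theorem 1


def poisson_pmf(k, mean):
    """P(N = k) for N ~ Poisson(mean); zero for negative k."""
    if k < 0:
        return 0.0
    return math.exp(-mean) * mean**k / math.factorial(k)


def scaled_skellam(i, x, y, terms=80):
    """sqrt(x + y) * P(X - Y = i) for independent X ~ Poisson(x), Y ~ Poisson(y)."""
    # P(X - Y = i) = sum_k P(X = k + i) * P(Y = k), over k with k >= 0 and k + i >= 0
    prob = sum(poisson_pmf(k + i, x) * poisson_pmf(k, y)
               for k in range(max(0, -i), terms))
    return math.sqrt(x + y) * prob
```

With $x = y = \bar u$ and $i = 0$ the function returns a value matching `M_CONST` to high accuracy, while `scaled_skellam(0, 0.5, 0.0)` reproduces the single-Poisson value $1/\sqrt{2e}$.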

Remark. In the case of a single Poisson variable $X \sim \mathcal{P}(x)$ (i.e. $y = 0$),
which is obtained as a limit of Binomials $B(n, x/n)$, the remark after Lemma 4
implies the stronger and sharp estimate $\sigma_X P(X = i) = \sqrt{x}\, e^{-x}\, \frac{x^i}{i!} \le \frac{1}{\sqrt{2e}}$.

A natural question arising from these observations, and which seems to
be open, is to characterize the class of distributions that can be obtained
as limits of sums and differences of Bernoullis. A fundamental result of
Khintchine [11] (see also the classical book by Gnedenko and Kolmogorov
[8, Theorem 2, p. 115]) characterizes those distributions that are obtained
as limits of sums of independent random variables. The latter may or may
not be Bernoullis though, so that this general result provides only necessary
conditions for our more specific question.